just describing summary stats, etc.

Going to plot correlations to see how these various inputs are related to each other.
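A minimal sketch of the correlation step, using a toy dataframe. The column names (`win`, `deaths`, `visionScore`, `assists`) are assumptions about the real dataset's schema, not its actual columns; in the notebook, `corr` would feed a heatmap.

```python
import numpy as np
import pandas as pd

# Toy per-match stats; column names are assumptions, not the real schema.
rng = np.random.default_rng(0)
n = 200
deaths = rng.poisson(5, n)
vision = rng.poisson(25, n)
df = pd.DataFrame({
    "win": (deaths < 5).astype(int),          # more deaths -> fewer wins
    "deaths": deaths,
    "visionScore": vision,
    "assists": rng.poisson(1 + vision / 10),  # vision loosely drives assists
})

# Pairwise Pearson correlations; plot this matrix as a heatmap to eyeball
# which inputs move together.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```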

Looks like deaths are pretty negatively correlated with winning; deaths also seem to hurt the team's chances of taking the first tower, inhibitor, etc.

Higher vision score looks like it goes along with more assists! Cool!

It seems that higher damage dealt to champions (and overall) correlates with more gold earned!

I'm surprised that inhibitor and turret kills don't affect the other statistics as much as I would have thought.

I need to split the df up by playerId.

In general, I want to do clustering as part of my unsupervised learning process. In LoL, each player plays very differently and usually sticks to particular roles and particular champions. I want to first do some cluster analysis of match statistics on a by-player basis (not including champion or player identity) to see if these clusters correspond to a win/loss outcome, or maybe even to particular champions or types of champions.

To do this clustering, I will split the dataframe up by playerId, standardize each subset, and then perform the clustering. I will do this for each of the playerIds collected in the data, and then do one final clustering in a similar fashion with all players combined to see if that clustering can potentially recover individual players.
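The split-standardize-cluster loop could look roughly like this. The feature columns and k=4 are illustrative assumptions; the real notebook would use the actual match-stat columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy data: playerId plus a few numeric match stats (names are assumptions).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "playerId": rng.integers(0, 3, 300),
    "kills": rng.poisson(6, 300).astype(float),
    "deaths": rng.poisson(5, 300).astype(float),
    "goldEarned": rng.normal(11000, 2000, 300),
})

features = ["kills", "deaths", "goldEarned"]
labels = {}
for pid, grp in df.groupby("playerId"):
    # Standardize within each player so scale differences don't dominate.
    X = StandardScaler().fit_transform(grp[features])
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    labels[pid] = pd.Series(km.labels_, index=grp.index)

# Reattach cluster labels to the original rows via the index.
df["cluster"] = pd.concat(labels.values())
```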

Mildly interesting. I think there's something under the surface going on here, and if I had to guess, it has something to do with the role that the player selects for a match.

For a given player, it seems that one of the assigned clusters usually leans heavily toward one win/loss outcome. Take player 1 for example: each match that falls into cluster 1 is predominantly a loss for this player. For player 22, a match in cluster 4 almost always results in a win. In some cases there's a steep concentration of matches where vision score is very high but damage is low; these could be games where the player selects a support role, or potentially a tank or jungle role that relies on providing vision on the map. Hard to say if that is actually having an impact.
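One quick way to check the cluster-to-outcome lean is a normalized crosstab of cluster label against the win flag. This sketch fabricates labels with a lose-heavy cluster 1 and a win-heavy cluster 3 just to show the shape of the check; in the real analysis both columns come from the per-player clustering and the match data.

```python
import numpy as np
import pandas as pd

# Fabricated cluster assignments and outcomes for illustration only.
rng = np.random.default_rng(2)
cluster = rng.integers(0, 4, 400)
# Make cluster 1 lose-heavy and cluster 3 win-heavy, mimicking the pattern
# described for player 1 and player 22.
p_win = np.select([cluster == 1, cluster == 3], [0.2, 0.9], default=0.5)
win = (rng.random(400) < p_win).astype(int)

# Row i gives P(loss) and P(win) within cluster i.
rates = pd.crosstab(cluster, win, normalize="index")
print(rates.round(2))
```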

I think there's a lot going on in terms of the dimensionality, and no one variable or group of variables is dominating outcomes enough to make really sharp clusters.

I looked at PCA here just to see if reducing the dimensionality would help me see or recognize any relationships. I originally looked at just two clusters for the PCs. I faceted plots by the champion played and the win/loss class, and realized that the win/loss boundary was cutting most of the PCA clusters at a skew relative to how k-means decided the clusters. I then looked at 4 clusters for k-means and got something that seems to have some level of interpretability!
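The PCA-then-k-means step could be sketched like this: project the standardized stats onto two components, inspect the loadings to see what each PC is "mainly associated with", then run k-means with k=4 in PC space. Feature names and sizes here are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Toy match stats; feature names are assumptions about the real columns.
rng = np.random.default_rng(3)
features = ["kills", "deaths", "goldEarned", "visionScore", "damageDealt"]
X = StandardScaler().fit_transform(rng.normal(size=(300, len(features))))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)

# Loadings: how strongly each original variable contributes to PC1 / PC2.
loadings = pd.DataFrame(pca.components_.T, index=features,
                        columns=["PC1", "PC2"])
print(loadings.round(2))

# Cluster in PC space with k=4, then facet plots by cluster and win/loss.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scores)
print(np.bincount(km.labels_))
```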

PC1 is mainly associated with variables that lend themselves toward a good performance in a match, or being in the action. PC2 in most cases relates to either playing map objectives and taking a lot of damage/dying, or just simply playing poorly. When PC1 is high, the clusters that fall in that area end up being mostly wins; when PC2 is high, they're mostly losses. It's important to note that player winrate usually sits around ~50%, so it makes sense that there could be another cluster for each win/loss classification that represents playing badly and still having your team win the game! The same could go for playing really well (to an extent) and still losing. I think the reason the separation of clusters isn't an exact '+' shape is that even if you play well and lose, you most likely are not doing as well relative to the other team, who is actually winning and snowballing, etc.

Now I'm going to cluster the whole dataset at once.

I don't think the clustering is recovering anything useful here.
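One way to back up "not recovering anything useful" with a number is a silhouette score on the pooled clustering: values near 0 mean the clusters heavily overlap. This sketch uses pure noise as a stand-in for the combined dataset, so its score should indeed be weak; the real check would run on the actual pooled, standardized stats.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Noise as a stand-in for the pooled all-players dataset (assumption).
rng = np.random.default_rng(4)
X = StandardScaler().fit_transform(rng.normal(size=(500, 6)))

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
score = silhouette_score(X, km.labels_)
print(round(score, 3))  # near 0 => weak, overlapping clusters
```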

If I had chosen more exclusive match variables, like physical and magical damage dealt separately, would more separation occur between the clusters?

I don't have much to take away from EDA, unfortunately. The most I could do is examine how pairs of inputs interact and use that to tune binary classification models.
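A cheap way to test whether pairwise interactions carry signal is to compare a plain logistic regression against one with interaction features. Everything here is synthetic: the outcome is built to depend on an interaction, just to show the comparison the tuning step would run on the real data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic inputs; the label depends on an interaction (x0 * x1), which a
# purely linear model cannot capture.
rng = np.random.default_rng(5)
X = rng.normal(size=(400, 3))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
inter = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

base_acc = cross_val_score(base, X, y, cv=5).mean()
inter_acc = cross_val_score(inter, X, y, cv=5).mean()
print(round(base_acc, 3), round(inter_acc, 3))
```

If the interaction pipeline scores clearly higher on the real data, that is a hint worth feeding into the classifier tuning.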

I feel like supervised methods might show more promise. Something like SVMs might find utility with my data.